Abstract

Dynamical modelling lies at the heart of our understanding of 
physical systems. Its role in science is deeper than mere operational 
forecasting, in that it allows us to evaluate the adequacy of the math- 
ematical structure of our models. Despite the importance of model 
parameters, there is no general method of parameter estimation out- 
side linear systems. A new relatively simple method of parameter 
estimation for nonlinear systems is presented, based on variations in 
the accuracy of probability forecasts. It is illustrated on the Logistic 
Map, the Henon Map and the 12-D Lorenz96 flow, and its ability to 
outperform linear least squares in these systems is explored at various 
noise levels and sampling rates. As expected, it is more effective when 
the forecast error distributions are non-Gaussian. The new method 
selects parameter values by minimizing a proper, local skill score for 
continuous probability forecasts as a function of the parameter values. 
This new approach is easier to implement in practice than alternative 
nonlinear methods based on the geometry of attractors or the ability 
of the model to shadow the observations. New direct measures of in- 
adequacy in the model, the "Implied Ignorance" and the information 
deficit are introduced.

The estimation of physical constants (parameters) plays a central role
in the physical sciences. Yet there is no general method of parameter es- 
timation for nonlinear dynamical systems pQ. In what may have been one 
early use of least squares (see [2, 3]), Gauss [I] predicted where the newly
discovered Ceres would appear as it emerged from behind the sun. The 
prediction involved both a parameter (Newton's gravitational constant) and 
initial conditions (brief observations of Ceres before occultation). This
success, where other methods failed, supported both the least squares approach
and the mathematical form of Newton's Laws. A new Minimum Ignorance 
(MI) approach to parameter estimation for use in dynamical systems is pre- 
sented and illustrated in nonlinear cases where the common "least squares" 
approaches can be systematically biased. While the focus is on cases in which 
the data archive is relatively large and the model structure is correct (see
footnote 1), the approach may prove useful outside this Perfect Model Scenario [7]. In the MI
approach, many large sets of probability forecasts are made, each using dif- 
ferent parameter values; the quality of those parameter values is determined 
by the quality of the corresponding set of probability forecasts. A measure of 
the internal consistency of probability forecasts is also introduced, providing 
quantitative insight into modelling inadequacy. 

Parameter estimation for deterministic nonlinear models poses several 
challenges, as nonlinear processes can be sensitive to initial conditions and 
parameter specifications. Traditional methods, like least squares, are sub- 
optimal when forecast errors are non-Gaussian, even if the observational un- 
certainties are normally distributed. One aim of this paper is to stress that 
fact given the common, and often unguarded, use of least squares. Several 
methods have been proposed to address the shortcomings of traditional meth- 
ods: McSharry and Smith [5] estimate model parameters by incorporating the 
global behaviour of the model into the selection criteria; Creveling et al [H], 
Maybhate and Amritkar [10] have exploited synchronisation for parameter es- 
timation; Smith et al. [H] focused on the geometric properties of trajectories; 
Heald and Stark [12] include estimation of the noise model. Recently Quinn 
and Abarbanel [13] have demonstrated that parameter and state can be estimated
via evaluation of a discrete time path integral in model state space. They also 
note applications in a number of fields including neurobiology, atmospheric 
and oceanic sciences, cell biology, chemical engineering, wastewater treatment
and biochemistry. There are also variational approaches [2], multiple
shooting methods [T3] and sequential methods based loosely on the Kalman
Filter [15j H2J H3 EE]. Several of these alternative approaches are contrasted
with the MI approach in the conclusion section.

1 Specifically, cases where there is a parameter value for which the mathematical model
is empirically adequate. The motion of Mercury is inconsistent with the mathematical form
of Newton's Laws; an internally consistent description requires General Relativity. It is
not that the value of a parameter in Newton's Laws is uncertain, but rather the value is
indeterminate [5]: no value will yield results consistent with observed planetary motion.
In the "Perfect Model Scenario", the "True" parameter is unknown, but does exist [5].

Results of MI parameter estimation are presented and critically examined 
for three chaotic systems: the Logistic Map, the Henon Map and the 12- 
D Lorenz96 flow. MI is shown to outperform linear least squares in these 
systems when the nonlinearities are relevant; at small lead times and low 
noise levels MI and LS are comparable. The MI method does not solve the 
problem of nonlinear parameter estimation completely, but it does highlight 
the failure of common linear methods and allow significant progress in some 
nonlinear cases, progress which may generalize beyond the Perfect Model 
Scenario. 

Technical Problem Statement 

Parameter estimation is a ubiquitous problem in scientific modelling [U
El El El H2J [131 GEJ ED]. While well understood in linear systems [2D 1221 E3],
challenges remain in nonlinear systems [21]. Discussions of parameter estimation
typically assume: dynamical systems are linear (or can be linearised),
the mathematical structure of the model is perfect (thus "True" parameter 
values exist) and that the statistics of observational uncertainty are known 
(the "noise model" is perfect). A more complete discussion is provided by 
Tarantola [21] who in Figure 3.2 sketches six schematic examples, four that 
are linear or linearisable, one requiring fully nonlinear methods, and one too 
complex for his methods to be used. Problems of the fifth category in the 
context of prediction, the so-called "forward problem", are approached here.

Assume the evolution of a system state $x_i \in \mathbb{R}^m$ is governed by a finite
dimensional, discrete time, deterministic nonlinear dynamical system:

$x_{i+1} = F(x_i, a)$, (1)

where $x \in \mathbb{R}^m$ and the model's parameters are contained in the vector $a \in \mathbb{R}^l$.
For $m = 1$, the state is a scalar. For simplicity, forecasts are evaluated on a
scalar observation below, even when $m > 1$. Assuming additive measurement
noise $\delta_i$ yields observations $s_i = x_i + \delta_i$. A set of $l+1$ sequential measurements
$s_i, \ldots, s_{i+l}$ would, in general, be sufficient to determine $a$ in a noise free
setting (i.e. $\delta_i = 0\ \forall i$) [8]. With noise, the task is somewhat harder.

Given model structure F(x,a) and observations generated by particular 
parameter a (the "True" parameter value), one can identify values for a 
consistent with the available information. Parameter estimates are made on 
the basis of the skill of the probability forecast. To ease comparison with
previous work [11], the approach is illustrated using three nonlinear models:
the 1-D Logistic Map:

$F(x, a) = 1 - a x^2$ (2)

the 2-D Henon Map:

$x_{i+1} = 1 - a x_i^2 + y_i$
$y_{i+1} = b x_i$ (3)

and a 12-D Lorenz96 flow [25].
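The two maps and the additive observation noise described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the Gaussian noise model, the initial condition `x0 = 0.3`, and the length of the discarded transient are all assumptions made for the example.

```python
import random

def logistic_map(x, a):
    """One step of the Logistic Map in the form F(x, a) = 1 - a x^2."""
    return 1.0 - a * x * x

def henon_map(x, y, a, b):
    """One step of the Henon Map: x' = 1 - a x^2 + y, y' = b x."""
    return 1.0 - a * x * x + y, b * x

def noisy_logistic_series(a, n, noise_sd, x0=0.3, transient=100, seed=0):
    """Return (states, observations); observations add Gaussian noise to states.

    x0, transient and seed are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(transient):  # discard the initial transient
        x = logistic_map(x, a)
    states, obs = [], []
    for _ in range(n):
        states.append(x)
        obs.append(x + rng.gauss(0.0, noise_sd))
        x = logistic_map(x, a)
    return states, obs
```

Calling `noisy_logistic_series(1.85, 1024, 0.01)` would give a noisy series of the kind used as input to a parameter estimation experiment.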



Minimum Ignorance Parameter Estimation 

The least squares (LS) method estimates parameters by minimising the root 
mean square error of a point forecast. Even given infinite data, the optimal 
LS solution is biased when applied to the Logistic Map [5]. The LS method 
fails because the assumption of Independent Normal Distributed (IND) fore- 
cast errors does not hold, even with IND observational noise. This is to be 
expected in nonlinear models. 
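To make the LS discussion concrete, the sketch below fits the Logistic Map parameter by one-step least squares applied directly to the observations. It assumes the cost $\sum_i (s_{i+1} - (1 - a s_i^2))^2$, whose minimiser has the closed form $a = \sum_i s_i^2 (1 - s_{i+1}) / \sum_i s_i^4$. The helper names, seed and noise level are hypothetical; this is one simple way of setting up LS, not necessarily the variant used in the experiments reported here.

```python
import random

def logistic_map(x, a):
    return 1.0 - a * x * x

def ls_estimate(obs):
    """Closed-form minimiser over a of sum_i (s_{i+1} - (1 - a s_i^2))^2."""
    num = sum((1.0 - s_next) * s * s for s, s_next in zip(obs, obs[1:]))
    den = sum(s ** 4 for s in obs[:-1])
    return num / den

def make_obs(a, n, noise_sd, seed=1):
    """Noisy observations of a Logistic Map trajectory (Gaussian noise assumed)."""
    rng = random.Random(seed)
    x = 0.3
    for _ in range(100):  # discard the transient
        x = logistic_map(x, a)
    obs = []
    for _ in range(n):
        obs.append(x + rng.gauss(0.0, noise_sd))
        x = logistic_map(x, a)
    return obs

clean_estimate = ls_estimate(make_obs(1.85, 2000, 0.0))
noisy_estimate = ls_estimate(make_obs(1.85, 2000, 0.05))
print(clean_estimate, noisy_estimate)
```

Without noise the closed form recovers the generating parameter up to rounding. With noise the observed $s_i$ enters the regressor $s_i^2$, so the estimate moves away from the generating value, and collecting more data does not remove that shift.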

A point value based on an imperfectly observed initial state is incomplete 
as a forecast [2B] ; given observational uncertainty, an ensemble of initial states 
of the system consistent with given observations [27] is required to propagate 
this initial uncertainty, suggesting probabilistic forecasts via Monte Carlo 
ensembles. 

Scoring Probabilistic Forecasts 

A probabilistic skill score is a function $S(p(y), Y)$, where $Y$ is the outcome
and $p(y)$ is a probability forecast [28]. The Ignorance Score [2HJ ED] is given
by:

$S(p(y), Y) = -\log_2(p(Y))$ (4)

Ignorance is the only proper local score for continuous variables [311 E2]. In
practice, given $N$ forecast-outcome pairs $(p_i(y), Y_i,\ i = 1, \ldots, N)$, the
Empirical Ignorance is:

$S_{EI}(p(y), Y) = \frac{1}{N} \sum_{i=1}^{N} -\log_2(p_i(Y_i)) - S_{clim}$, (5)

where $S_{clim}$ is defined using the unconditional probability or "climatology"
of $y$, denoted $p_c(y)$; this is simply the natural measure projected onto the
forecast variable. The zero skill of Ignorance is then

$S_{clim} = \int -p_c(y) \log_2(p_c(y))\, dy$ (6)

While other proper skill scores might be used in this context, Ignorance
is the only proper local skill score for continuous variables; it is invariant
under smooth changes of coordinates. Ensuring these properties is desirable
in parameter estimation.
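The Ignorance score and its empirical average are simple to compute once the forecast density at each outcome is known. The sketch below is illustrative only: the function names are hypothetical, and the climatological term is estimated by a midpoint rule on a bounded interval, which is an assumption of the example rather than a prescription from the text.

```python
import math

def ignorance(p_at_outcome):
    """Ignorance of a single forecast: -log2 of the density at the outcome."""
    return -math.log2(p_at_outcome)

def empirical_ignorance(densities_at_outcomes, s_clim=0.0):
    """Mean Ignorance over forecast-outcome pairs, relative to s_clim."""
    n = len(densities_at_outcomes)
    return sum(ignorance(p) for p in densities_at_outcomes) / n - s_clim

def climatological_ignorance(clim_density, lo, hi, steps=10000):
    """Midpoint-rule estimate of the integral of -p_c log2(p_c) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = clim_density(lo + (k + 0.5) * h)
        if p > 0.0:  # the integrand tends to zero as p tends to zero
            total -= p * math.log2(p) * h
    return total
```

A forecast that places density 0.5 on the outcome scores 1 bit; halving the density placed on the outcome costs one further bit.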

Ensembles and Probability Forecasting 

An ensemble forecast is based on a collection of simulations simultane- 
ously. There are many methods for forming an ensemble of initial states [331 
SUES]. Perhaps the simplest method is to add draws from the inverse of the
observational noise to the observation to define ensemble members. In that 
case ensemble members are equally weighted, as each ensemble member is 
an independent draw. With this Inverse Noise method, the initial states are 
unlikely to be consistent with the long term model dynamics (e.g. they are 
not "on the attractor" should one exist). 
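A minimal sketch of the Inverse Noise construction follows. It assumes zero-mean Gaussian observational noise, for which the inverse noise distribution coincides with the noise distribution itself, and it propagates members under the Logistic Map in the form $1 - a x^2$; the function names are hypothetical.

```python
import random

def inverse_noise_ensemble(observation, noise_sd, n_members, rng):
    """Equally weighted initial states: the observation plus independent draws.

    Assumes symmetric (Gaussian) noise, so inverse noise equals noise in law.
    """
    return [observation + rng.gauss(0.0, noise_sd) for _ in range(n_members)]

def forecast_ensemble(members, a, lead_time):
    """Iterate every member lead_time steps under x -> 1 - a x^2."""
    out = []
    for x in members:
        for _ in range(lead_time):
            x = 1.0 - a * x * x
        out.append(x)
    return out

rng = random.Random(0)
initial = inverse_noise_ensemble(0.2, 0.01, 16, rng)
final = forecast_ensemble(initial, 1.85, 4)
```

Nothing in this construction uses the dynamics, which is why such initial states need not lie near the attractor.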

Continuous forecast distributions can be produced from an ensemble by
kernel dressing its members. Standard kernel dressing is used below (see [28,
30] for more details, and [36] for a Bayesian approach). Define an $N_e$ member
ensemble at time $i$ to be $X_i = [x_i^1, \ldots, x_i^{N_e}]$ and treat all ensemble members
as exchangeable: the ensemble interpretation methods used do not depend
on the ordering of the ensemble members ^ZE\. Standard kernel dressing
transforms the ensemble members into a probability density function $p_m$
where:

$p_m(y : X, k) = \frac{1}{N_e k} \sum_{j=1}^{N_e} K\left(\frac{y - x^j}{k}\right)$. (7)

In this case the forecast distribution is a sum of Gaussian kernels $K(\cdot)$, the $j$th
ensemble member being replaced by a kernel centred at $x^j$. For each value of
$a$, the kernel width, $k$, is chosen to minimise the Empirical Ignorance defined
in equation (5) above [28]. There remains the chance that the verification lies
outside the range of any finite ensemble, even if the verification is selected
from the same distribution as the ensemble itself; the probability of this
happening is approximately $2/N_e$. Given the nonlinearity of the model, these points may
be very far from the ensemble, and appear as "outliers" or "bad busts".
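Standard kernel dressing can be sketched as follows, assuming the dressed density is the equally weighted average of Gaussian kernels of a common width centred on the ensemble members. The grid search over candidate widths and the small density floor are simplifications introduced for this example; all names are hypothetical.

```python
import math

def gaussian_kernel(z):
    """Standard Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dressed_density(y, members, width):
    """Average of Gaussian kernels of a common width centred on the members."""
    n = len(members)
    return sum(gaussian_kernel((y - x) / width) for x in members) / (n * width)

def best_width(ensembles, outcomes, candidates):
    """Pick the candidate width with the lowest mean Ignorance (grid search)."""
    def mean_ignorance(width):
        total = 0.0
        for members, y in zip(ensembles, outcomes):
            # The floor avoids log2(0) when an outcome is far from every member.
            total -= math.log2(max(dressed_density(y, members, width), 1e-300))
        return total / len(outcomes)
    return min(candidates, key=mean_ignorance)
```

The density floor is a crude guard against outcomes that fall far outside the ensemble; blending with climatology handles such cases in a more principled way.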

Given a sample climatology of the system from historical data, prob- 
abilistic forecasts may be improved out-of-sample by blending the dressed 
ensemble with the sample climatology [28], thereby allowing both narrower 
kernels and fewer bad busts. Blending with climatology yields the forecast 
distribution: 

$p(y) = \alpha p_m(y) + (1 - \alpha) p_c(y)$ (8)

where $p_m$ reflects the ensemble, $p_c$ the climatology, and $\alpha \in [0, 1]$ is the
blending weight. The probability forecast obtained will be a function of the
parameter $a$. Values of $a$ with small Empirical Ignorance are deemed better.
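The effect of blending on bad busts can be seen in a short sketch. It assumes the blend is a convex combination of the dressed ensemble density and the climatological density with a weight between 0 and 1; the function name and the numbers in the example are hypothetical.

```python
import math

def blended_density(y, p_model, p_clim, alpha):
    """Convex combination alpha * p_model(y) + (1 - alpha) * p_clim(y)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_model(y) + (1.0 - alpha) * p_clim(y)

# A "bad bust": the dressed ensemble places no density on the outcome,
# yet the blended forecast still has finite Ignorance.
bust = blended_density(0.0, lambda y: 0.0, lambda y: 0.5, 0.75)
bust_ignorance = -math.log2(bust)
```

With these illustrative numbers the blended density at the outcome is 0.125, so the bust costs 3 bits rather than an unbounded penalty.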

Comparing forecast performance of different models is not a fair com- 
parison without blending climatology. It might be the case that, without 
blending climatology Model A outscores Model B, while after blending cli- 
matology Model B scores higher than Model A. Since the sample climatology 
is available to any model, the comparison should include this information. 
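The steps described in this section can be combined into a compact end-to-end sketch for the Logistic Map: form Inverse Noise ensembles, propagate them with a candidate parameter value, dress and blend the result, and score it with Ignorance. To stay short, the sketch fixes the kernel width and blending weight instead of optimising them for each parameter value, uses a kernel-dressed subsample of the observations as the climatology, and omits the climatological offset, which does not depend on the parameter and so does not move the minimum. All names and numerical settings are assumptions made for illustration.

```python
import math
import random

def iterate(x, a, n):
    """Apply the Logistic Map x -> 1 - a x^2 n times."""
    for _ in range(n):
        x = 1.0 - a * x * x
    return x

def kernel_density(y, centres, width):
    """Equally weighted Gaussian kernels of a common width."""
    norm = len(centres) * width * math.sqrt(2.0 * math.pi)
    total = 0.0
    for c in centres:
        z = (y - c) / width
        total += math.exp(-0.5 * z * z)
    return total / norm

def mean_ignorance(a, obs, lead, noise_sd, clim, n_members=16,
                   width=0.05, alpha=0.9, seed=0):
    """Mean Ignorance (bits) of blended ensemble forecasts made with parameter a."""
    rng = random.Random(seed)  # identical draws for every a keep scores comparable
    total = 0.0
    count = len(obs) - lead
    for i in range(count):
        members = [iterate(obs[i] + rng.gauss(0.0, noise_sd), a, lead)
                   for _ in range(n_members)]
        target = obs[i + lead]
        p = (alpha * kernel_density(target, members, width)
             + (1.0 - alpha) * kernel_density(target, clim, 0.1))
        total -= math.log2(p)
    return total / count

def mi_estimate(obs, grid, lead, noise_sd):
    """Return the grid value with the lowest mean Ignorance, and all scores."""
    clim = obs[::4]  # crude sample climatology (an assumption of this sketch)
    scores = [mean_ignorance(a, obs, lead, noise_sd, clim) for a in grid]
    return grid[scores.index(min(scores))], scores

# Synthetic experiment with generating parameter 1.85 and noise level 0.01.
rng = random.Random(42)
x, obs = 0.3, []
for _ in range(400):
    x = 1.0 - 1.85 * x * x
    obs.append(x + rng.gauss(0.0, 0.01))
grid = [1.75 + 0.02 * k for k in range(11)]
a_hat, scores = mi_estimate(obs, grid, lead=1, noise_sd=0.01)
print(a_hat)
```

A fuller implementation would re-optimise the kernel width and blending weight for each parameter value, as the text describes, and would evaluate the result out of sample.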

Evaluation and Results 



Figures 1 and 2 show the Empirical Ignorance scores as a function of lead
time, $\tau$, and parameter value $a$ for the Logistic Map and for the Henon Map.
Figure 1 shows five different noise levels $\sigma$ for two lead times: $\tau = 1$ in
panel (a) and $\tau = 4$ in panel (b). The vertical line marks the "True" parameter
value of 1.85. Figure 2 reports results from the Henon map, showing how the
MI approach outperforms an LS method. The '+' in each panel marks the
"True" values of $a$ and $b$. Panel (a) shows the inferiority of the LS error.

Figure 1: (Color) Minimum Ignorance parameter estimation for Logistic Map
with $a = 1.85$; initial condition ensembles are formed by Inverse Noise. Five
different noise levels are tested, each given 1024 forecasts; (a) Ignorance as a
function of $a$ for $\tau = 1$, the minima are marked with an "x"; (b) for $\tau = 4$.



Returning to Figure 1(a), note the bias away from the "True" value. MI
estimates at longer $\tau$ (Figure 1(b)) tend to provide less biased estimates.
The small $\tau$ bias is due to imperfections in the initial ensemble: neither the
observation itself nor the initial ensemble formed by Inverse Noise is consistent
with the long time dynamics. The natural measure of the Logistic Map
is not uniform; for some parameter values it may be fractal. A dynamically
consistent ensemble is an ensemble of initial conditions which are not only
consistent with the observational noise, but also consistent with the natural
measure (see footnote 2). Figure 3 shows that using a more dynamically consistent
ensemble of initial conditions (in this case merely consistent with the current
observation; see footnote 3) produces less biased results at short $\tau$ and also
improves results at larger $\tau$.

Figure 2: (Color) Parameter estimation for Henon map ($a = 1.4$; $b = 0.3$), noise
level = 0.05, given 1024 forecasts at lead time 4. (a) a cost function based on
LS, (b) a cost function based on forecast Ignorance.

Figure 3: (Color online) Parameter estimation for Logistic Map with $a = 1.85$
using dynamically consistent ensembles. Contrast the (improved) Ignorance
value relative to Figure 1 where the same lead times and noise level are used.



2 Note for a structurally perfect model, the dynamically consistent ensemble will ap- 
proach a perfect ensemble [57] at the "True" parameter value when a long window of 
observations is considered. 

3 Here the dynamically consistent ensemble of initial conditions is only consistent with
the current observations; requiring consistency with a series of observations would result in
more informative ensembles. Locating states that are also consistent with more past
observations is much more costly.

To demonstrate that the advantage of MI parameter estimation need
not vanish in higher dimensional systems, the single parameter Lorenz96
system [25] is considered with $m = 12$ and the parameter $F = 17$, using
Inverse Noise ensembles. Nonlinear effects are reduced at smaller lead times
$\tau$ and lower noise levels $\sigma$. In Lorenz96, the estimation errors of LS and MI
are roughly the same with $\tau = 0.5$ and $\sigma = 0.1$ (this is about 0.2% of the range
of the data). Increasing the noise level to $\sigma = 1$, the estimation error from
LS is about 8 times that of MI. Alternatively, keeping $\sigma = 0.1$ and increasing to
$\tau = 1$ yields an error in the LS estimate about 3 times larger. For any smooth
$F(x)$ the linear approximation will hold in the limit of infinitesimally small
observational noise; even in this limit MI estimation will outperform linear
methods which, like variants of the Kalman filter, fail to respect the natural
measure of $F$.
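For readers who wish to reproduce a Lorenz96 experiment, the sketch below implements the commonly used single-parameter form $dx_i/dt = (x_{i+1} - x_{i-2})\,x_{i-1} - x_i + F$ with cyclic indices, together with a fourth-order Runge-Kutta step. That this is the exact form and integration scheme used for the results above is an assumption; the function names and step size are hypothetical.

```python
def lorenz96_rhs(x, forcing):
    """Time derivative of the Lorenz96 system with cyclic boundary conditions."""
    m = len(x)
    # Negative indices wrap around in Python, giving the cyclic neighbours.
    return [(x[(i + 1) % m] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(m)]

def rk4_step(x, forcing, dt):
    """Advance the state by one fourth-order Runge-Kutta step of size dt."""
    def shifted(u, v, c):
        return [ui + c * vi for ui, vi in zip(u, v)]
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(x, k1, 0.5 * dt), forcing)
    k3 = lorenz96_rhs(shifted(x, k2, 0.5 * dt), forcing)
    k4 = lorenz96_rhs(shifted(x, k3, dt), forcing)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

# A 12-dimensional state, slightly perturbed from the fixed point x_i = F.
state = [17.0] * 12
state[0] += 0.01
for _ in range(100):
    state = rk4_step(state, 17.0, 0.01)
```

The forcing argument plays the role of the single parameter to be estimated; a lead time is then a fixed number of integration steps.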

Imperfect Model Scenario 

In the statistics literature, parameters within the Perfect Model Scenario, 
where a "True" value is thought to exist but is unknown, are sometimes 
referred to as "quantities with a well defined physical meaning." (see for 
instance, [IS]). Here the distinction is made between fitting parameters in 
a "physical model" and a "curve fitting model", where in the second case 
parameters are defined only relative to some goal. As will be demonstrated 
below, if the mathematical structure of the model is imperfect there is no
unique parameter value that is "optimal"; the "best" parameter may vary
with application (the lead time of the forecast, for example). Within the Perfect
Model Scenario the "True" or optimal parameter value exists but is unknown;
outside the Perfect Model Scenario it is not unknown but undefined: one
is dealing not with uncertainty but with ambiguity [5].

All analysis techniques including LS are limited to exploring the informa- 
tion contained in the data; large forecast-outcome archives and lower observa- 
tional noise levels contain more information and thus allow better parameter 
estimates when the model structure is perfect. When the model class does 
not admit an empirically adequate model, the notion of a "True" parameter 
value is lost. The MI approach remains useful for identifying the best parameter
in an imperfect model if a notion of "best" is defined in terms of forecast
performance.

Next consider a system-model pair in the Imperfect Model Scenario. The
Quartic system is defined as

$G(x) = a\left((1 - l)\, x(1 - x) + \frac{4l}{5}\, x(1 - 2x^2 + x^3)\right)$. (9)

The model in this case is

$G(x) = a x(1 - x)$, (10)

which is just the Logistic Map in another form [37]. At $l = 0$ the model
is perfect (for $l \neq 0$ the model has structural error); $l = 0.1$ is considered
here. Given the observations generated by the system with additive noise,
the goal is to estimate the parameter of the (imperfect) model. Figure 4(a)
shows the Empirical Ignorance scores as a function of parameter value for the
Logistic model at lead time 1; following Figure 1, five different noise levels are
examined. In Figure 4(b) the noise level is fixed while five different lead times are
examined. Note the dashed black line reflects the parameter $a$ used in the
Quartic system, which need no longer be the target of the model parameter
value $a$. Results for three independent experiments are shown, indicating
that the bias away from the system parameter value is robust. Figure 4
shows that the MI estimate varies with lead time and noise level. In both cases
the notion of "best" is defined in terms of forecast skill given an Inverse
Noise initial condition ensemble. The system parameter value $a$ is not equal
to the best model parameter value $a$.
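The system-model pair can be written down directly. The sketch assumes the Quartic system has the form $a\,((1 - l)\,x(1 - x) + \tfrac{4l}{5}\,x(1 - 2x^2 + x^3))$, a reading of Equation (9) in which the factor $4/5$ makes both terms peak at $1/4$ when $x = 1/2$, and that the model is $a\,x(1 - x)$ as in Equation (10). Function names are hypothetical.

```python
def quartic_system(x, a, l):
    """Assumed Quartic system: a mix of logistic and quartic terms weighted by l."""
    logistic_term = (1.0 - l) * x * (1.0 - x)
    quartic_term = (4.0 * l / 5.0) * x * (1.0 - 2.0 * x ** 2 + x ** 3)
    return a * (logistic_term + quartic_term)

def logistic_model(x, a):
    """The (imperfect) model: the Logistic Map in the form a x (1 - x)."""
    return a * x * (1.0 - x)

# Structural error at l = 0.1: no single model parameter matches the system
# everywhere. The two maps agree at x = 0.5 but differ at x = 0.25.
gap_at_half = quartic_system(0.5, 4.0, 0.1) - logistic_model(0.5, 4.0)
gap_at_quarter = quartic_system(0.25, 4.0, 0.1) - logistic_model(0.25, 4.0)
```

Under this reading the model is structurally perfect only at $l = 0$, where the two functions coincide for every $x$.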




Figure 4: (Color online) Parameter estimation for Logistic model in the
Imperfect Model Scenario, with parameter $a = 4$ of Quartic system, using
Inverse Noise ensembles. Results from three independent realizations are
shown, each given 1024 forecasts; note consistency in locating the minimum
(x). The similarity of these three lines indicates the result is robust. (a)
Empirical Ignorance scores as a function of the parameter value for lead time
1 forecasts at several noise levels; (b) Empirical Ignorance scores as a function
of the parameter value and lead time given noise level = 1/128.



Further Discussion

Minimum Ignorance parameter estimation considers the entire forecasting
scenario; once the notion of "best" is defined any alteration of the forecast- 
ing scheme may alter the best parameter value. In this section, effects of 
ensemble formation and kernel dressing are discussed, and an alternative to 
"potential predictability" is suggested. 

The variance of the standard kernel dressed ensemble is of course al- 
ways larger than the variance of the raw ensemble, no matter how the kernel 
width is actually determined [38]. More complicated dressing methods exist:
Brocker and Smith [28], for example, introduced an improved kernel dressing,
called "affine kernel dressing", that is more flexible and robust. Standard
kernel dressing is used here as it is straightforward to understand, easier to 
implement, and fit for our purpose. More advanced data assimilation meth- 
ods may yield more informative ensembles (for example Indistinguishable 
States [M], Monte Carlo methods [13, 33]). If it is costly to run the model
(as with weather/climate models), Inverse Noise provides a much faster and 
cheaper first-pass estimate. There are also alternative low cost distributions 
one can use to blend with the dressed ensemble forecast other than the un- 
conditional climatology, for example a dynamical climatology ensemble based 
on analogues to the current state (see the discussion of eRAP in [27]). The 
MI approach generalises beyond estimating "physical" parameters as it can 
be used for structural parameters as, for example, in delay space reconstructions
(see Farmer and Sidorowich [39] and citations thereof) and model reduction
[40]. Finally, note that it is also possible to estimate the parameters
of the noise model(s) [12] within the MI framework. 

"Potential predictability" reflects the utility an existing forecast system 
would have if it were perfect [JT]. Interpreting this as utility carries some
risk, of course, as the actual system may be much more predictable (or much
less) than the dynamics of the current generation of models. An alternative 
approach which can quantify the (historical) impact of model inadequacy is 
to contrast the Empirical Ignorance with the Implied Ignorance, defined as 



The Implied Ignorance is the Ignorance one would expect to observe if in fact 
the probability forecast was perfect. The difference between Empirical Igno- 
rance and Implied Ignorance reveals an information deficit (in bits), which 
exposes shortcomings anywhere in the forecast methodology. In contrast 
with the so called "estimate" of skill from "potential predictability" experi- 
ments which assumes the model is perfect, the information deficit quantifies 
just how far the predictability of the current model is from (its internal) 
perfection. Within the Perfect Model Scenario, the Implied Ignorance can
approach the Empirical Ignorance for the "True" parameter values (if and
only if the entire ensemble forecasting package is perfect). Even when the
model structure is mathematically correct, the Empirical Ignorance may be
greater than the Implied Ignorance, indicating that the model PDF is an
incomplete reflection of the expected uncertainties. Indeed the information
deficit provides quantitative information on second order uncertainty. Figure 5
illustrates this in both the perfect model case (Figure 5(a) and (c)) and
the imperfect model case (Figure 5(b) and (d)). In the perfect model case,
the Empirical Ignorance and Implied Ignorance should converge to within
sampling error at the "True" parameter when a (many step) dynamically
consistent ensemble is employed. In Figure 5(a) and (b), the upper blue line
shows the Empirical Ignorance for the Inverse Noise ensemble, the lower green
line shows the Empirical Ignorance for a (one step) dynamically consistent
ensemble. The upper red line and lower purple line correspond to the Implied
Ignorance for each ensemble formation strategy respectively. Figure 5(c) and
(d) show the information deficits corresponding to Figure 5(a) and (b). The
information deficit will remain nonzero as long as Inverse Noise is used. In
Figure 5(a) and (c) the information deficit for dynamically consistent ensembles
remains nonzero because these dynamically consistent ensembles are
only consistent with one observation; as the window of dynamical consistency
increases and the ensemble size increases, the information deficit will
approach zero. On the other hand, in the imperfect model case (Figure 5(b)
and (d)) the information deficit will remain nonzero, whatever one does,
owing to the model inadequacy.
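A numerical sketch of the Implied Ignorance and the information deficit follows. It takes the verbal definition at face value: if the outcome were drawn from the forecast density itself, the expected Ignorance of that forecast is $\int -p(y) \log_2 p(y)\, dy$. Whether the quantity is reported relative to climatology, and how it is averaged over forecasts, are left open here; the function names and the integration scheme are assumptions of the example.

```python
import math

def gaussian_pdf(y, mean, sd):
    """Gaussian density, used here only as a test forecast."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def implied_ignorance(density, lo, hi, steps=20000):
    """Expected Ignorance (bits) if the outcome were drawn from `density` itself.

    Midpoint-rule estimate of the integral of -p log2(p) over [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = density(lo + (k + 0.5) * h)
        if p > 0.0:
            total -= p * math.log2(p) * h
    return total

def information_deficit(empirical_bits, implied_bits):
    """Bits by which observed performance falls short of internal expectations."""
    return empirical_bits - implied_bits

implied = implied_ignorance(lambda y: gaussian_pdf(y, 0.0, 1.0), -10.0, 10.0)
```

For a set of forecasts, the implied values would be averaged in the same way as the empirical scores before the two are differenced.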

Also note in Figure 5 that the information deficit of the Inverse Noise ensemble
is smaller than that of the dynamically consistent ensemble. This is
somewhat misleading, in the same way that potential predictability is con- 
sistently misleading. Confusion can be avoided by noting that forecasts using 
the (one step) dynamically consistent ensemble provide almost 2 bits more 
information beyond those from the Inverse Noise ensembles. 

Conclusion 

Although widely popular, the method of LS is optimal only in a narrow 
context, a fact stressed by Kalman [12]; LS is often applied well outside its 
mathematical remit. While a general account of parameter estimation re- 
mains lacking, the straightforward Minimum Ignorance approach introduced 
here is shown to yield good parameter estimation in several chaotic systems. 
Initial experiments suggest that the MI approach is also useful for identifying 
the best parameter in an imperfect model as long as the notion of "best" is well
defined.

Figure 5: (Color) Empirical Ignorance and Implied Ignorance as a function
of parameter value with noise level $\sigma = 1/128$ for lead time 1. Curves are shown for
both the Inverse Noise ensemble and the dynamically consistent ensemble. 1024
forecasts are considered in each. (a) The Perfect Model Scenario with
the Logistic Map $F(x, a) = 1 - a x^2$, (b) the Imperfect Model Scenario with
the system-model pair of Equations (9) and (10), (c) information deficit in the Perfect
Model Scenario, (d) information deficit in the Imperfect Model Scenario.

MI is expected to perform well against the myriad of modern alternatives. 
The initial value approach [14] is reminiscent of four-dimensional variational
assimilation [43] (4DVAR), minimizing the cost function not only for the initial
condition but also for the parameter values; like 4DVAR it is computationally
expensive and suffers from local minima. These variational methods
differ from the LS method, inasmuch as LS fails fundamentally while the initial
value approach fails numerically. Simply put, the root of this failure lies
in chaotic likelihoods [44]. Voss [15] applied a multiple shooting method to
address the local minima problem in the initial value approach: the initial value
approach is applied in short windows, resembling a similar spin up procedure applied
to 4DVAR [43]; the approach remains expensive and Voss's examples show
varying success. MI might be considered as a useful pre-filter for the method
of [32]; even then that method requires ad hoc continuity constraints.

Sequential (recursive) methods provide an alternative approach. Kalman 
filter methods are most often applied to state estimation; to estimate the
parameter one may simply add the parameter vector to the state vector [15].
For weakly nonlinear systems the extended Kalman Filter [18] can be used.
For strongly nonlinear systems, Voss [15] introduced the unscented Kalman
filter. Notice that for such sequential methods, the parameter vector is
allowed to evolve in time; Expectation Maximization (EM) algorithms [16]
account for this. Each of these methods performs better where the Gaussianity
assumption holds more strongly. MI does not require any Gaussian
constraints whatsoever.

MI parameter estimation also has the advantage that it is easy to use. 
Methods which contrast the natural measure of the model with the obser- 
vations [8] are significantly more complicated, and grow more so as the di- 
mensionality of the model increases. Alternative methods which contrast 
shadowing times of the model as a function of parameter values [11] are 
significantly more computationally expensive. MI estimation using Inverse 
Noise ensembles is straightforward to implement and relatively inexpensive 
computationally. It will fail to indicate the "True" parameter value when 
the ensemble is not distributed consistently with respect to the model's long 
term dynamics (natural measure), but the parameter value MI suggests will 
give better probabilistic forecasts than the "True" parameter value as long 
as the flawed ensemble formation scheme is used. Investing more in data as- 
similation is shown to improve parameter estimates. When the mathematical 
structure of the model is incommensurate with the structure of the system 
generating the observations, the ultimate goal of parameter estimation is un- 
clear. As illustrated above, the "optimal" parameter value may, for example, 
vary with lead time. In such cases, MI can still provide useful parameter 
estimation as long as the goal ("optimal") is well defined.

MI parameter estimation by ensemble prediction provides a useful new 
tool, avoiding the shortcomings of other approaches. The information deficit 
reflected in the Implied Ignorance can reveal forecast system inadequacies 
and quantify the predictability in a more informative manner than "potential 
predictability" does. We are optimistic that this framework will allow some 
progress outside the Perfect Model Scenario.